bonsoon's blog |
| latest | about | random
# Expected length of a binary pattern occurring in a run. Suppose we have a fair coin that gives either head (H) or tail (T) with equal probability, and we fix a finite heads-tail sequence $X=X_{1}X_{2}\cdots X_{k}$ of length $k$, where $X_{i} \in \{H,T\}$, what is the expected length of tosses of this fair coin until we see the pattern $X$? For example if the sequence is $HH$, then what is the expected number of tosses in a row until we finally see a consecutive $HH$? How about $HT$? Do you have a guess which pattern would we expect to see first? Let us compute this. Let $x = E(L)$ to be the expected length of seeing $HH$. Now, if the first toss is $T$, then essentially we restart the process, but with one additional toss added. This happens half the time, so we have $$ x = E(L) = \frac{1}{2} (1 + x) + \cdots $$For the other half of the time, we start with the toss $H$. Now if the following toss is $H$, then we win, but if the following toss is $T$, then it essentially restarts the process. So $$ E(L) = \frac{1}{2} (1 + x) + \frac{1}{2}\left( \frac{1}{2} \cdot 2 + \frac{1}{2}(2+ x)\right) $$Solving for $E(L)$ gives $$ E(L) = 6. $$ What about $HT$? Again let us set $x = E(L)$. Suppose the first toss is $T$, then this restarts the waiting process, so we have $$ x = E(L) = \frac{1}{2} (1 + x) + \cdots $$And if the first toss is $H$, two things happens afterwards. Either we follow up with $T$, which then we win, or we toss $H$, which again we will need to wait for $T$ again. So given the first toss is a head, we now become a geometric distribution $Y\sim Geom(\frac{1}{2})$ waiting for a $T$. Hence $$ x = E(L) = \frac{1}{2}(1+x) + \frac{1}{2}(1 + E(Y)). $$Since $E(Y) = 2$, we can solve for $x = E(L)$, giving us $$ E(L) = 4. $$ This is interesting! As these two patterns have unequal expected waiting lengths! Let us populate these first few patterns and their expected waiting lengths: $$ \begin{array}{} \text{pattern} & \text{expected length} \\ \hline H & 2 \\ T & 2 \\ HH & 6 \\ HT & 4 \\ TH & 4 \\ TT & 6 \end{array} $$We can get the other ones by using symmetry. And the expected length of a length one pattern is just the expected value of a geometric variable $Y \sim Geom(\frac{1}{2})$. This gives rise to an interesting question. Given some pattern $X$, what is $f(X)$ the expected waiting length of seeing pattern $X$? We can employ the same method as we had to compute these. For example, let us compute $x = f(HHH)$. If the first toss is $T$, then we end up restarting the process, so we have $$ x = f(HHH) = \frac{1}{2}(1+x) + \cdots $$ If instead we start with $HT$, then we also restart the process. So we have $$ x = f(HHH) = \frac{1}{2} (1 + x) + \frac{1}{4} (2 + x) + \cdots $$ If instead we start with $HH$, it leads to $HHH$ or $HHT$. The first case we win, while the second case we restart. So we have $$ x = f(HHH) = \frac{1}{2} (1 + x) + \frac{1}{4}(2 + x) + \frac{1}{8}(3) + \frac{1}{8}( 3 + x) $$ Solving for $x = f(HHH)$ gives $$ f(HHH) = 14. $$ ## A graphical analysis. Suppose we a pattern $HTHHT$ that we want to find the expected length to see this pattern from fair coin tosses. We construct this binary tree ``` x / \ H(y) x / \ y T(z) / \ H x / \ H z / \ y T ``` This shows that if we set $x = E(L)$ the expected length of the run to see the pattern $HTHHT$, we have the system of equation $$ \begin{align*} x &= \frac{1}{2}(1+y) + \frac{1}{2}(1+x) \\ y &= \frac{1}{2}(1+y) + \frac{1}{2}(1+z) \\ z &= \frac{1}{2}(1+x) + \frac{1}{4}(2+z) + \frac{1}{8}(3) + \frac{1}{8}(3+y) \end{align*} $$ This gives $x = E(L) = 36$. Another example. Suppose the pattern is $HTHTH$, then we have the following binary tree instead. ``` x / \ H(y) x / \ y T / \ H x / \ y T / \ H x ``` Which gives the following system of equations $$ \begin{align*} x &= \frac{1}{2}(1+y) + \frac{1}{2}(1+x) \\ y &= \frac{1}{2}(1+y) + \frac{1}{4}(2+x) + \frac{1}{8}(3 + y) + \frac{1}{16}(4) + \frac{1}{16}(4+x) \end{align*} $$ This gives $x = E(L) = 42$. A Python3 code to compute this, with the help of SymPy to solve a linear system is given below: ``` def matching(chk, tar): temp = chk done = tar.startswith(temp) while not done: temp = temp[1:] done = tar.startswith(temp) return len(temp) def seq(pattern): length = len(pattern) ret = [] for i in range(length): wh = pattern[:i] + 'H' wt = pattern[:i] + 'T' ret.append([matching(wh,pattern), matching(wt,pattern)]) return ret from sympy.matrices import Matrix, eye def expected_length(pattern): arr = [] b = [] size = len(pattern) for v in seq(pattern): arr += [0+(i in v) for i in range(size)] b += [1] b = Matrix(size,1,b) return int((eye(size) - 1/2 * Matrix(size,size,arr)).LUsolve(b)[0]) # This function computes the expected length to see a pattern in a coin toss run expected_length('HTTH') # 18 expected_length('HTTTHHTHH') # 514 ``` Is there a better way?